MS 5.54 - pg 215 Phishing attacks to email accounts.
Refer to the Chance (Summer, 2007) article on phishing attacks at a
company, Exercise 2.24 (p. 38). Recall that phishing describes an
attempt to extract personal/financial information through fraudulent
email. The company set up a publicized email account—called a “fraud
box”—which enabled employees to notify them if they suspected an email
phishing attack. If there is minimal or no collaboration or collusion
from within the company, the interarrival times (i.e., the time between
successive email notifications, in seconds) have an approximate
exponential distribution with a mean of 95 seconds.
0+exp(-120/95)
## [1] 0.2827597
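The value `exp(-120/95)` above is the upper-tail probability P(Y > 120) of an exponential distribution with mean 95. As a cross-check, a minimal sketch using base R's `pexp`, which is parameterized by the rate (1/mean); the variable name `p_over_120` is only illustrative.

```r
# Upper-tail probability P(Y > 120) for an exponential with mean 95 seconds.
# pexp() takes the rate, which is the reciprocal of the mean.
p_over_120 <- pexp(120, rate = 1 / 95, lower.tail = FALSE)
p_over_120
```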
phishing <- read.csv("PHISHING.csv")
hist(phishing$INTTIME, xlab = "INTTIME", main = "Histogram of phishing")
Yes. Judging by the shape of the histogram, the interarrival times appear to follow an exponential distribution.
MS 5.56 - pg 215 Flood level analysis. Researchers have
discovered that the maximum flood level (in millions of cubic feet per
second) over a 4-year period for the Susquehanna River at Harrisburg,
Pennsylvania, follows approximately a gamma distribution with \(\alpha\) = 3 and \(\beta\) = .07 (Journal of Quality
Technology, Jan. 1986).
alpha = 3
beta = 0.07
mean = alpha*beta
var = alpha*(beta)^2
mean
## [1] 0.21
var
## [1] 0.0147
# High end
mean+3*sqrt(var)
## [1] 0.5737307
# Low end
mean-3*sqrt(var)
## [1] -0.1537307
This event could happen, but it would be extremely rare: it would be an outlier on the high end.
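To put a number on how rare a value beyond the upper three-standard-deviation bound is, the gamma tail can be evaluated directly. This is a sketch, assuming the textbook's \(\beta\) corresponds to the `scale` argument of R's `pgamma`; the names `shape_g`, `scale_g`, and `upper` are illustrative.

```r
# Gamma(alpha = 3, beta = 0.07): the textbook's beta is R's scale parameter.
shape_g <- 3
scale_g <- 0.07

# Upper bound mean + 3 * sd, matching the "High end" value computed above.
upper <- shape_g * scale_g + 3 * sqrt(shape_g * scale_g^2)

# Probability that the maximum flood level exceeds that bound.
p_beyond <- pgamma(upper, shape = shape_g, scale = scale_g, lower.tail = FALSE)
p_beyond
```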
MS 5.60 - pg 216 Reaction to tear gas. The length of
time Y (in minutes) required to generate a human reaction to tear gas
formula A has a gamma distribution with \(\alpha\) = 2 and \(\beta\) = 2. The distribution for formula
B is also gamma, but with \(\alpha\) =
1 and \(\beta\) = 4.
alpha1 = 2
beta1 = 2
alpha2 = 1
beta2 = 4
mean1 = alpha1*beta1
mean2 = alpha2*beta2
mean1
## [1] 4
mean2
## [1] 4
var1 = alpha1*(beta1)^2
var2 = alpha2*(beta2)^2
var1
## [1] 8
var2
## [1] 16
For A:\[P(Y<1) = \int_{0}^{1} \frac{y^{2-1}e^{-y/2}}{2^2\Gamma(2)} dy = \int_{0}^{1}\frac{ye^{-y/2}}{4} dy\] \[= \frac{1}{4}(-2ye^{-y/2})\Big|_{0}^{1} + \frac{1}{4}\int_{0}^{1}2e^{-y/2} dy\] \[= \frac{1}{4}(-2e^{-1/2} + 2(0)e^0)+(-e^{-y/2})\Big|_{0}^{1} = -0.3033 -e^{-1/2}+e^0\] \[= -0.3033-0.6065+1=0.0902\] For B:\[P(Y<1) = \int_{0}^{1} \frac{y^{1-1}e^{-y/4}}{4^1\Gamma(1)} dy = \int_{0}^{1}\frac{e^{-y/4}}{4} dy\] \[= -e^{-y/4}\Big|_{0}^{1} = -e^{-1/4} + e^0\] \[= 1-0.7788 = 0.2212\]
Because 0.2212 > 0.0902, Formula B has a higher probability of generating a human reaction in less than 1 minute.
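The two integrals can be checked numerically. The following is a sketch using `pgamma`, again assuming the textbook's \(\beta\) is R's `scale`; `p_a` and `p_b` are illustrative names.

```r
# P(Y < 1) for formula A: gamma with alpha = 2, beta = 2.
p_a <- pgamma(1, shape = 2, scale = 2)

# P(Y < 1) for formula B: gamma with alpha = 1, beta = 4
# (an exponential distribution with mean 4).
p_b <- pgamma(1, shape = 1, scale = 4)

round(c(A = p_a, B = p_b), 4)
```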
MS 5.74 - pg 219 Washing machine repair time. Based on
extensive testing, a manufacturer of washing machines believes that the
distribution of the time Y (in years) until a major repair is required
has a Weibull distribution with \(\alpha\) = 2 and \(\beta\) = 4.
1-exp(-2^2/4)
## [1] 0.6321206
mean = 4^(1/2)*gamma(3/2)
mean
## [1] 1.772454
\[\sigma^2=\beta^{2/\alpha}\left[\Gamma\left(\frac{\alpha+2}{\alpha}\right)-\Gamma^2\left(\frac{\alpha+1}{\alpha}\right)\right]\]
sigmaSquared = 4^(2/2)*((gamma((2+2)/2))-(gamma(3/2))^2)
sigma = sqrt(sigmaSquared)
sigma
## [1] 0.9265028
# High range
high = mean+2*sigma
high
## [1] 3.625459
# Low range
low = mean-2*sigma
low
## [1] -0.08055165
(1-exp(-high^2/4))-(1-exp(-0^2/4))
## [1] 0.9625964
No, the probability is extremely small.
1-(1-exp(-6^2/4))
## [1] 0.0001234098
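The hand-coded Weibull probabilities can be reproduced with `pweibull`. Note the parameterization differs: the textbook uses \(F(y) = 1 - e^{-y^{\alpha}/\beta}\), while R uses \(F(y) = 1 - e^{-(y/\text{scale})^{\text{shape}}}\), so `scale` is \(\beta^{1/\alpha}\). A minimal sketch with illustrative variable names:

```r
alpha_w <- 2
beta_w  <- 4

# Convert the textbook's beta to R's scale parameter.
scale_r <- beta_w^(1 / alpha_w)

# P(Y < 2): a major repair is needed within the first two years.
p_before_2 <- pweibull(2, shape = alpha_w, scale = scale_r)

# P(Y > 6): no major repair is needed for more than six years.
p_after_6 <- pweibull(6, shape = alpha_w, scale = scale_r, lower.tail = FALSE)

c(p_before_2, p_after_6)
```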
MS 5.84 - pg 223 Laser color printer repairs. The
proportion Y of a data-processing company’s yearly hardware repair
budget allocated to repair its laser color printer has an approximate
beta distribution with parameters \(\alpha\) = 2 and \(\beta\) = 9.
mean = (2/(2+9))
mean
## [1] 0.1818182
\[\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\]
var = (2*9)/((2+9)^2*(2+9+1))
var
## [1] 0.01239669
1-1/beta(2,9)*(-0.4*(1-0.4)^9/9-(((1-0.4)^10/(9*10))-(1)^10/(9*10)))
## [1] 0.0463574
1/beta(2,9)*(-0.1*(1-0.1)^9/9-(((1-0.1)^10/(9*10))-(1)^10/(9*10)))
## [1] 0.2639011
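The two expressions above integrate the beta density by hand. As a cross-check, here is a sketch using `pbeta`, which evaluates the same cumulative probabilities directly; `p_at_least_40` and `p_at_most_10` are illustrative names.

```r
# Y ~ Beta(alpha = 2, beta = 9).
# P(Y >= 0.4): at least 40% of the budget goes to the printer.
p_at_least_40 <- pbeta(0.4, shape1 = 2, shape2 = 9, lower.tail = FALSE)

# P(Y <= 0.1): at most 10% of the budget goes to the printer.
p_at_most_10 <- pbeta(0.1, shape1 = 2, shape2 = 9)

c(p_at_least_40, p_at_most_10)
```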
MS 5.114 - pg 232 Lifelengths of memory chips. The
lifelength Y (in years) of a memory chip in a laptop computer is a
Weibull random variable with probability density \[
f(y) =
\begin{cases}
\frac{1}{8}ye^{-y^2/16} \quad \text{if } 0\leq y < \infty\\
0 \quad \quad \quad \quad \quad \quad \text{elsewhere}
\end{cases}
\]
alpha = 2
beta = 16
alpha
## [1] 2
beta
## [1] 16
mean = beta^(1/alpha)*gamma((alpha+1)/alpha)
var = beta^(2/alpha)*(gamma((alpha+2)/alpha)-(gamma((alpha+1)/alpha))^2)
mean
## [1] 3.544908
var
## [1] 3.433629
1-(1-exp(-(6)^alpha/beta))
## [1] 0.1053992
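The closed-form mean can be verified by integrating \(y f(y)\) numerically, and the tail probability by `pweibull`. This is only a sketch; `f_chip` is a hypothetical helper for the density given in the exercise, and R's `scale` is \(\beta^{1/\alpha} = 4\).

```r
# Density from the exercise: f(y) = (1/8) * y * exp(-y^2 / 16) for y >= 0.
f_chip <- function(y) (y / 8) * exp(-y^2 / 16)

# E(Y) as the integral of y * f(y) over [0, Inf).
mean_numeric <- integrate(function(y) y * f_chip(y), lower = 0, upper = Inf)$value

# P(Y > 6) using R's Weibull parameterization (shape = 2, scale = sqrt(16)).
p_over_6 <- pweibull(6, shape = 2, scale = sqrt(16), lower.tail = FALSE)

c(mean_numeric, p_over_6)
```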
MS 6.2 - pg 239 Tossing dice. Consider the experiment of
tossing a pair of dice. Let X be the outcome (i.e., the number of dots
appearing face up) on the first die and let Y be the outcome on the
second die.
(1/6)*(1/6)
## [1] 0.02777778
\[p(x, y) = \frac{1}{36}\]
\[\frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}= \frac{6}{36} = \frac{1}{6}\]
\[p_1(2) = P(X=2) = p(2,1)+p(2,2)+p(2,3)+p(2,4)+p(2,5)+p(2,6)\] \[= \frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}+\frac{1}{36}= \frac{6}{36} = \frac{1}{6}\]
When \(y = 1\), \(p_1(x | 1)=\frac{p(x,1)}{p_2(1)}\) \[p_1(1|1)=\frac{p(1,1)}{p_2(1)}=\frac{1/36}{1/6}=\frac{1}{6}\] The same calculation gives \(\frac{1}{6}\) for every other \(p_1(x | y)\), meaning \[p_1(x|y) = \frac{1}{6}\] The same probability holds for \(p_2(y|x)\), because \(1 \leq X \leq 6\) and \(1 \leq Y \leq 6\) and the joint distribution is symmetric in x and y.
\(p_1(x) = p_1(x|y)\) for every x and y, which implies that X and Y are independent.
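The independence check can also be done for all 36 cells at once. A minimal sketch, storing the joint distribution as a 6 x 6 matrix (rows for X, columns for Y); the object names are illustrative.

```r
# Joint distribution of two fair dice: every (x, y) pair has probability 1/36.
joint <- matrix(1 / 36, nrow = 6, ncol = 6,
                dimnames = list(x = 1:6, y = 1:6))

# Marginal distributions: sum over the other variable.
p1 <- rowSums(joint)  # p_1(x)
p2 <- colSums(joint)  # p_2(y)

# X and Y are independent when p(x, y) = p_1(x) * p_2(y) in every cell.
independent <- isTRUE(all.equal(joint, outer(p1, p2), check.attributes = FALSE))
independent
```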
MS 6.4 - pg 240 Modeling the behavior of granular media.
Refer to the Engineering Computations: International Journal for
Computer-Aided Engineering and Software (Vol. 30, No. 2, 2013) study of
the properties of granular media (e.g., sand, rice, ball bearings, and
flour), Exercise 3.62 (p. 120). The study assumes there is a system of N
non-interacting granular particles, where the particles are grouped
according to energy level, r. For this problem (as in Exercise 3.62),
assume that N = 7 and r = 3, then consider the scenario where there is
one particle (of the total of 7 particles) at energy level 1, two
particles at energy level 2, and four particles at energy level 3.
Another feature of the particles studied was the position in time where
the particle reached a certain entropy level during compression. All
particles reached the desired entropy level at one of three time
periods, 1, 2, or 3. Assume the 7 particles had the characteristics
shown in the table. Consider a randomly selected particle and let X
represent the energy level and Y the time period associated with
particle.
knitr::include_graphics("Q8.png")
| y (rows), x (columns) | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 1/7 | 2/7 | 1/7 |
| 2 | 0 | 0 | 2/7 |
| 3 | 0 | 0 | 1/7 |
| x | 1 | 2 | 3 |
|---|---|---|---|
| \(p_1(x)\) | 1/7 | 2/7 | 4/7 |
| y | 1 | 2 | 3 |
|---|---|---|---|
| \(p_2(y)\) | 4/7 | 2/7 | 1/7 |
When \(X=1\), \(p_2(y|1)=\frac{p(1,y)}{p_1(1)}\) \[p_2(1|1) = \frac{p(1,1)}{p_1(1)} = \frac{1/7}{1/7}=1\] \[p_2(2|1) = \frac{p(1,2)}{p_1(1)} = \frac{0}{1/7}=0\] \[p_2(3|1) = \frac{p(1,3)}{p_1(1)} = \frac{0}{1/7}=0\]
| y | 1 | 2 | 3 |
|---|---|---|---|
| \(p_2(y|1)\) | 1 | 0 | 0 |
The same reasoning applies when \(X=2\) and \(X=3\).
For \(X=2\)
| y | 1 | 2 | 3 |
|---|---|---|---|
| \(p_2(y|2)\) | 1 | 0 | 0 |
For \(X=3\)
| y | 1 | 2 | 3 |
|---|---|---|---|
| \(p_2(y|3)\) | 1/4 | 2/4 | 1/4 |
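The marginal and conditional tables above can be reproduced from the joint table in a few lines. This is a sketch that stores the joint probabilities as a matrix with rows for y and columns for x; `cond_y_given_x` is an illustrative name.

```r
# Joint distribution p(x, y): rows are y = 1, 2, 3; columns are x = 1, 2, 3.
joint <- matrix(c(1, 2, 1,
                  0, 0, 2,
                  0, 0, 1) / 7,
                nrow = 3, byrow = TRUE,
                dimnames = list(y = 1:3, x = 1:3))

p1 <- colSums(joint)  # marginal of X: p_1(x)
p2 <- rowSums(joint)  # marginal of Y: p_2(y)

# Conditional p_2(y | x): divide each column by the matching p_1(x).
cond_y_given_x <- sweep(joint, 2, p1, "/")
cond_y_given_x
```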
MS 6.12 - pg 244 Distribution of low bids. The
Department of Transportation (DOT) monitors sealed bids for new road
construction. For new access roads in a certain state, let X = low bid
(thousands of dollars) and let Y = DOT estimate of fair cost of building
the road (thousands of dollars). The joint probability density of X and
Y is \[f(x, y) =
\frac{e^{-y/10}}{10y},\hspace{8 mm} 0 < y < x < 2y\]
Find f(y), the marginal density function for Y. Do you recognize this distribution? \[f_2(y)=\int_{y}^{2y}f(x,y)dx=\int_{y}^{2y}\frac{e^{-y/10}}{10y}dx\] \[=\frac{e^{-y/10}}{10y}x \Big|_{y}^{2y}=\frac{e^{-y/10}}{10y}(2y-y)=\frac{e^{-y/10}}{10}\] This is an exponential distribution, \(\beta = 10\).
What is the mean DOT estimate, E(Y)?
\(E(Y)=\mu=\beta = 10\).
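As a numerical check of the marginal density and its mean, the sketch below integrates \(f_2(y) = e^{-y/10}/10\) with base R's `integrate`; `f2` is a hypothetical helper name.

```r
# Marginal density of Y derived above: exponential with beta = 10.
f2 <- function(y) exp(-y / 10) / 10

# A valid density integrates to 1 over (0, Inf).
total_prob <- integrate(f2, lower = 0, upper = Inf)$value

# E(Y) is the integral of y * f2(y).
mean_y <- integrate(function(y) y * f2(y), lower = 0, upper = Inf)$value

c(total_prob, mean_y)
```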
MS 6.14 - pg 245 Servicing an automobile. The joint
density of X, the total time (in minutes) between an automobile’s
arrival in the service queue and its leaving the system after servicing,
and Y, the time (in minutes) the car waits in the queue before being
serviced, is \[f(x,y) =
\begin{cases}
ce^{-x^2} \quad \text{if } 0\leq y < x; 0 \leq x < \infty\\
0 \quad \quad \quad \text{elsewhere}
\end{cases}\]
Find the value of c that makes f(x, y) a probability density function. \[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)dydx = 1\] \[\int_{0}^{\infty}\int_{0}^{x}ce^{-x^2}dydx=\int_{0}^{\infty}ce^{-x^2}y\Big|_{0}^{x}dx = \int_{0}^{\infty}ce^{-x^2}x dx\] \[\int_{0}^{\infty}cxe^{-x^2}dx=-\frac{ce^{-x^2}}{2}\Big|_{0}^{\infty}=\frac{c}{2}=1 \Rightarrow c=2\]
Find the marginal density for X and show that \[\int f_1(x)dx = 1\] \[f_1(x)=\int_{-\infty}^{\infty}f(x,y)dy=\int_{0}^{x}2e^{-x^2}dy=2e^{-x^2}y\Big|_{0}^{x}=2xe^{-x^2}\] \[\int_{-\infty}^{\infty}f_1(x)dx=\int_{0}^{\infty}2xe^{-x^2}dx=-e^{-x^2}\Big|_{0}^{\infty}=0+1=1\]
Show that the conditional density for Y given X is a uniform distribution over the interval \(0 \leq Y \leq X\). \[f_2(y|x)=\frac{f(x,y)}{f_1(x)}=\frac{2e^{-x^2}}{2xe^{-x^2}}=\frac{1}{x} \hspace{1cm} 0 \leq y < x\] Because this density does not depend on y, it is uniform on the interval from 0 to x.
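The constant c and the marginal density can be checked numerically. A minimal sketch with base R's `integrate`; `c_const` and `f1` are illustrative names.

```r
# After integrating out y, the joint density reduces to c * x * exp(-x^2).
# c must be the reciprocal of the integral of x * exp(-x^2) over (0, Inf).
kernel_integral <- integrate(function(x) x * exp(-x^2), lower = 0, upper = Inf)$value
c_const <- 1 / kernel_integral

# Marginal density of X with c = 2; it should integrate to 1.
f1 <- function(x) 2 * x * exp(-x^2)
total_prob <- integrate(f1, lower = 0, upper = Inf)$value

c(c_const, total_prob)
```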
MS 6.51 - pg 253 As an illustration of why the converse
of Theorem 6.6 is not true, consider the joint distribution of two
discrete random variables, X and Y, shown in the accompanying table.
Show that Cov(X, Y) = 0, but that X and Y are dependent.
knitr::include_graphics("Q11.png")
\[Cov(X,Y)=E(XY) - E(X)E(Y)\] \[E(XY)=\sum_{x}\sum_{y}xyp(x,y)=(-1)(-1)(\frac{1}{12})+(-1)(0)(\frac{2}{12})+(-1)(1)(\frac{1}{12})\] \[+(0)(-1)(\frac{2}{12})+(0)(0)(0)+(0)(1)(\frac{2}{12})+(1)(-1)(\frac{1}{12})+(1)(0)(\frac{2}{12})+(1)(1)(\frac{1}{12})\] \[=\frac{1}{12}+0-\frac{1}{12}+0+0+0-\frac{1}{12}+0+\frac{1}{12}=0\] Then \[P(X=-1)=p_1(-1)=p(-1,-1)+p(-1,0)+p(-1,1)=\frac{1}{12}+\frac{2}{12}+\frac{1}{12}=\frac{4}{12}=\frac{1}{3}\] \[P(X=0)=p_1(0)=p(0,-1)+p(0,0)+p(0,1)=\frac{2}{12}+0+\frac{2}{12}=\frac{4}{12}=\frac{1}{3}\] \[P(X=1)=p_1(1)=p(1,-1)+p(1,0)+p(1,1)=\frac{1}{12}+\frac{2}{12}+\frac{1}{12}=\frac{4}{12}=\frac{1}{3}\] \[E(X)=\sum_{x}xp_1(x)=(-1)(\frac{1}{3})+(0)(\frac{1}{3})+(1)(\frac{1}{3})=0\] The same calculation applies to \((Y=-1)\), \((Y=0)\), and \((Y=1)\), so \(p_2(y)=\frac{1}{3}\) for each value.
\[E(Y)=\sum_{y}yp_2(y)=(-1)(\frac{1}{3})+(0)(\frac{1}{3})+(1)(\frac{1}{3})=0\]
\[Cov(X,Y)=E(XY)-E(X)E(Y)=0-0(0)=0\] Independence would require \(p(x,y)=p_1(x)p_2(y)\) for every pair, but \[p(-1,-1)=\frac{1}{12}\neq(\frac{1}{3})(\frac{1}{3})=\frac{1}{9}\] Thus, \(X\) and \(Y\) are not independent even though \(Cov(X,Y) = 0\).
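The same conclusion can be reached by computing over the whole table. This is a sketch, assuming the joint probabilities are those used in the derivation above (the table image itself is not reproduced here); rows are x and columns are y.

```r
vals <- c(-1, 0, 1)

# Joint probabilities p(x, y) as used in the derivation (in twelfths).
joint <- matrix(c(1, 2, 1,
                  2, 0, 2,
                  1, 2, 1) / 12,
                nrow = 3, byrow = TRUE,
                dimnames = list(x = vals, y = vals))

p1 <- rowSums(joint)  # marginal of X
p2 <- colSums(joint)  # marginal of Y

# Cov(X, Y) = E(XY) - E(X) E(Y).
e_xy   <- sum(outer(vals, vals) * joint)
cov_xy <- e_xy - sum(vals * p1) * sum(vals * p2)

# Independence requires p(x, y) = p_1(x) p_2(y) in every cell.
independent <- isTRUE(all.equal(joint, outer(p1, p2), check.attributes = FALSE))

c(cov_xy = cov_xy, independent = independent)
```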
MS 6.74 - pg 269 Uranium in the Earth’s crust. Refer to
the American Mineralogist (October 2009) study of the evolution of
uranium minerals in the Earth’s crust, Exercise 5.17 (p. 199). Recall
that researchers estimate that the trace amount of uranium Y in
reservoirs follows a uniform distribution ranging between 1 and 3 parts
per million. In a random sample of n = 60 reservoirs, let \(\overline{Y}\) represent the sample mean
amount of uranium.
Find \(E(\overline{Y})\) and interpret its value. \[E(\bar{Y})=E(\frac{\sum_{i=1}^{60} Y_i}{n})=\frac{1}{n}E(\sum_{i=1}^{60}Y_i)=\frac{1}{60}[60(2)]=2\]
Find Var \(\overline{Y}\). \[\sigma^2_{\bar{Y}}=V(\bar{Y})=V(\frac{\sum_{i=1}^{60} Y_i}{n})=(\frac{1}{n})^2V({\sum_{i=1}^{60} Y_i})\]
sigmaSquared = (1/60^2)*(60/3)
sigmaSquared
## [1] 0.005555556
Describe the shape of the sampling distribution of \(\overline{Y}\). By the Central Limit Theorem, the sampling distribution of \(\bar{Y}\) is approximately normal, since n = 60 is large.
Find the probability that \(\overline{Y}\) is between 1.5 ppm and 2.5 ppm.
sigma = sqrt(sigmaSquared)
sigma
## [1] 0.0745356
\[P(1.5 \leq \bar{Y} \leq 2.5)=P(\frac{1.5-2}{0.0745} \leq Z \leq \frac{2.5-2}{0.0745})=P(-6.71 \leq Z \leq 6.71)\] \[=P(0 \leq Z \leq 6.71)+P(0 \leq Z \leq 6.71) \approx 0.5+0.5=1\]
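The table lookup can be replaced by `pnorm`. A minimal sketch under the same normal approximation, using the uniform(1, 3) mean and variance; the variable names are illustrative.

```r
n_res <- 60
mu_y  <- (1 + 3) / 2        # mean of a uniform(1, 3) variable
var_y <- (3 - 1)^2 / 12     # variance of a uniform(1, 3) variable

# Standard deviation of the sample mean.
sd_ybar <- sqrt(var_y / n_res)

# P(1.5 <= Ybar <= 2.5) under the normal approximation.
p_between <- pnorm(2.5, mean = mu_y, sd = sd_ybar) -
  pnorm(1.5, mean = mu_y, sd = sd_ybar)

c(sd_ybar, p_between)
```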
MS 6.90 - pg 273 Mercury contamination of swordfish.
Consumer Reports found widespread contamination of seafood in New York
and Chicago supermarkets. For example, 40% of the swordfish pieces
available for sale have a level of mercury above the Food and Drug
Administration (FDA) limit. Consider a random sample of 20 swordfish
pieces from New York and Chicago supermarkets.
Use the normal approximation to the binomial to calculate the probability that fewer than 2 of the 20 swordfish pieces have mercury levels exceeding the FDA limit. With \(\mu = np = 20(0.4) = 8\) and \(\sigma = \sqrt{npq} = \sqrt{20(0.4)(0.6)} = 2.1909\): \[P(Y<2)=P(Z < \frac{1.5-8}{2.1909})=P(Z<-2.97)=0.5-P(-2.97 < Z <0)=0.5 -0.4985=0.0015\]
Use the normal approximation to the binomial to calculate the probability that more than half of the 20 swordfish pieces have mercury levels exceeding the FDA limit. \[P(Y>10)=P(Z>\frac{10.5-8}{2.1909})=P(Z>1.14)=0.5-P(0<Z<1.14)=0.5-0.3729 = 0.1271\]
Use the binomial tables to calculate the exact probabilities in parts a and b. Does the normal distribution provide a good approximation to the binomial distribution? \[P(Y<2)=P(Y \leq 1)=0.0005\] \[P(Y>10)=1-P(Y \leq 10)=1-0.8725=0.1275\] Both approximations are within about 0.001 of the exact values, so the normal approximation provides a good estimate of the binomial probabilities.
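Both the approximate and the exact probabilities can be computed without tables. This is a sketch using `pnorm` with the continuity correction and `pbinom` for the exact values; the variable names are illustrative.

```r
n_fish <- 20
p_fish <- 0.4
mu_b    <- n_fish * p_fish
sigma_b <- sqrt(n_fish * p_fish * (1 - p_fish))

# Normal approximation with continuity correction.
approx_lt2  <- pnorm(1.5, mean = mu_b, sd = sigma_b)                       # P(Y < 2)
approx_gt10 <- pnorm(10.5, mean = mu_b, sd = sigma_b, lower.tail = FALSE)  # P(Y > 10)

# Exact binomial probabilities.
exact_lt2  <- pbinom(1, size = n_fish, prob = p_fish)                      # P(Y <= 1)
exact_gt10 <- pbinom(10, size = n_fish, prob = p_fish, lower.tail = FALSE) # P(Y >= 11)

round(rbind(approx = c(approx_lt2, approx_gt10),
            exact  = c(exact_lt2, exact_gt10)), 4)
```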
MS 7.108 - pg 362 Lead and copper in drinking water.
Periodically, the Hillsborough County (Florida) Water Department tests
the drinking water of homeowners for contaminants such as lead and
copper. The lead and copper levels in water specimens collected for a
sample of 10 residents of the Crystal Lake Manors subdivision are shown
next.
knitr::include_graphics("Q14.png")
v = c(1.32,0,13.1,.919,.657,3.0,1.32,4.09,4.45,0)
mu = mean(v)
mu
## [1] 2.8856
sigma = sqrt(var(v))
sigma
## [1] 3.924775
# Range Estimate
mu-sigma
## [1] -1.039175
mu+sigma
## [1] 6.810375
(-1.15, 6.92)
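The interval (-1.15, 6.92) is a 99% t-interval for the mean, which is wider than the one-standard-deviation range computed just before it. A sketch of how it can be obtained with `t.test`, assuming the ten values in `v` are the lead measurements; `lead` and `ci_lead` are illustrative names.

```r
# Lead levels for the 10 specimens (same values as the vector v above).
lead <- c(1.32, 0, 13.1, 0.919, 0.657, 3.0, 1.32, 4.09, 4.45, 0)

# 99% confidence interval for the mean, based on the t distribution
# with n - 1 = 9 degrees of freedom.
ci_lead <- t.test(lead, conf.level = 0.99)$conf.int
ci_lead
```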
Construct a 99% confidence interval for the mean copper level in water specimens from Crystal Lake Manors. (0.1519, 0.6647)
Interpret the intervals, parts a and b, in the words of the problem. We are 99% confident that the mean lead level in water specimens from Crystal Lake Manors is between -1.15 and 6.92. Since the lead level cannot be negative, we are 99% confident that the mean lead level in water specimens from Crystal Lake Manors is between 0 and 6.92. Similarly, we are 99% confident that the mean copper level is between 0.1519 and 0.6647.
Discuss the meaning of the phrase, “99% confident.” 99% confidence means that in repeated sampling, 99% of all intervals constructed in a similar manner will contain the true population mean.
MS 7.114 - pg 364 Solar irradiation study. The Journal
of Environmental Engineering (Feb. 1986) reported on a heat transfer
model designed to predict winter heat loss in wastewater treatment
clarifiers. The analysis involved a comparison of clear-sky solar
irradiation for horizontal surfaces at different sites in the Midwest.
The day-long solar irradiation levels (in BTU/sq. ft.) at two midwestern
locations of different latitudes (St. Joseph, Missouri, and Iowa Great
Lakes) were recorded on each of seven clear-sky winter days. The data
are given in the table. Find a 95% confidence interval for the mean
difference between the day-long clear-sky solar irradiation levels at
the two sites. Interpret the results.
knitr::include_graphics("Q15.png")
For confidence coefficient 0.95, \(\alpha\) = 0.05 and \(\alpha\)/2 = 0.05/2 = 0.025. With v = n-1 = 7-1 = 6 degrees of freedom, \(t_{0.025}\) = 2.447. The 95% confidence interval is: \[\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}} \Rightarrow 198.0 \pm 2.447 \frac{44.5}{\sqrt{7}} \Rightarrow 198.0 \pm 41.16 \Rightarrow (156.84, 239.16)\] We are 95% confident that the true mean difference between the day-long clear-sky solar irradiation levels at the two sites is between 156.84 and 239.16 BTU/sq. ft.
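The interval can be recomputed from the summary statistics quoted above. This is a sketch only: it takes \(\bar{d} = 198.0\) and \(s_d = 44.5\) as given rather than recomputing them from the data table, and uses `qt` for the critical value.

```r
d_bar <- 198.0   # mean of the paired differences (from the calculation above)
s_d   <- 44.5    # standard deviation of the paired differences
n_d   <- 7       # number of paired days

# Critical t value for 95% confidence with n - 1 degrees of freedom.
t_crit <- qt(1 - 0.05 / 2, df = n_d - 1)

# Paired-difference confidence interval: d_bar +/- t * s_d / sqrt(n).
ci_solar <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n_d)
ci_solar
```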
MS 7.116 - pg 364 Diazinon residue in orchards.
Pesticides applied to an extensively grown crop can result in
inadvertent areawide air contamination. Environmental Science &
Technology (Oct. 1993) reported on air deposition residues of the
insecticide diazinon used on dormant orchards in the San Joaquin Valley,
California. Ambient air samples were collected and analyzed at an
orchard site for each of 11 days during the most intensive period of
spraying. The levels of diazinon residue (in mg/m3) during the day and
at night are recorded in the table. The researchers want to know whether
the mean diazinon residue levels differ from day to night.
knitr::include_graphics("Q16.png")
For confidence coefficient 0.90, \(\alpha\) = 0.10 and \(\alpha\)/2 = 0.10/2 = 0.05. With v = n-1 = 11-1 = 10 degrees of freedom, \(t_{0.05}=1.812\). The 90% confidence interval is: \[\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}} \Rightarrow -38.9 \pm 1.812 \frac{36.5799}{\sqrt{11}} \Rightarrow -38.909 \pm 19.985 \Rightarrow (-58.894, -18.924)\] We are 90% confident that the true mean difference (day minus night) in diazinon residue levels is between -58.894 and -18.924.
What assumptions are necessary for the validity of the interval estimation procedure of part a? We must assume the population of differences is normal.
Use the interval, part a, to answer the researchers’ question. Since the confidence interval in part a does not contain 0, there is evidence of a difference in the mean diazinon residue between day and night. Since the interval contains only negative numbers, the mean diazinon residue for night is greater than the mean diazinon residue for day.
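The 90% interval and the check for whether it contains 0 can be sketched as a small helper. `paired_ci` is a hypothetical function name, and the inputs are the summary statistics quoted above (\(\bar{d} = -38.909\), \(s_d = 36.5799\), n = 11), not values recomputed from the data table. Because `qt` does not round the critical value to 1.812, the endpoints differ from the hand calculation in the third decimal place.

```r
# Confidence interval for a mean paired difference from summary statistics.
paired_ci <- function(d_bar, s_d, n, conf_level) {
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  d_bar + c(-1, 1) * t_crit * s_d / sqrt(n)
}

ci_diazinon <- paired_ci(d_bar = -38.909, s_d = 36.5799, n = 11, conf_level = 0.90)

# The means differ (at this confidence level) if the interval excludes 0.
excludes_zero <- ci_diazinon[1] > 0 || ci_diazinon[2] < 0

ci_diazinon
excludes_zero
```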